Abstract. We revisit Fermat's factorization method for a positive integer
$n$ that is a product of two primes $p$ and $q$. Such an integer is used as
the modulus for both encryption and decryption operations of an RSA
cryptosystem. The security of RSA relies on the hardness of factoring this
modulus. As a consequence of our analysis, two variants of Fermat's
approach emerge. We also present a comparison between the two methods'
effective regions. Though our study does not yield a new state-of-the-art
algorithm for integer factorization, we believe that it reveals some
interesting observations that are open for further analysis.

1 Introduction

Integer factorization is a classic problem in computational number theory. Given
a positive integer $n$, factorization yields two positive integers $a > 1$, $b > 1$ such
that $n = ab$. With the advancement of digital computers, there has been
considerable progress towards solving this problem in recent times. The RSA
cryptosystem [8] uses the concept of trapdoor functions to build a public
key encryption technique. It is based on the fact that it is easy to multiply
two large prime numbers but extremely difficult to recover these primes by
factoring their product.

Factoring algorithms are of two types: special purpose and general purpose
algorithms. The efficiency of special purpose algorithms depends on the unknown
factors, whereas the efficiency of the latter depends on the number to be
factored. Some of the most important special purpose factoring algorithms are
trial division, Pollard's rho method [5], Pollard's $p - 1$ method [B], the elliptic
curve method pQ, Fermat's method [3], SQUFOF [7], etc. The quadratic sieve [2] is a
generic approach for developing general purpose algorithms. The most efficient
general purpose algorithm known so far is the number field sieve [3]. Special
purpose algorithms perform well for numbers with small factors, unlike the numbers
used in RSA. Therefore, general purpose factoring algorithms are the more
important ones in the context of cryptographic systems and their security.

In this paper, we revisit Fermat's factorization from a new perspective.
Fermat's method for factoring an odd integer $n$ consists of finding $n = x^2 - y^2$
where $x$ and $y$ are integers. One tries in succession
$x = \lceil\sqrt{n}\rceil, \lceil\sqrt{n}\rceil + 1, \ldots$
and determines whether the difference $x^2 - n$ is a square or not. If $p$ and $q$ are
primes and $n = pq$, then Fermat's method is quite efficient if $p/q$ is near 1, but it
requires a large number of trials if $p/q$ is not near 1. However, in the later stages
of this paper, we will show that our approach provides better results than
Fermat's method in certain regions, even though we based our initial approach on
the latter.



2 The Basic Method 



Let $n = pq$ where $p$ and $q$ are large primes separated by a considerably
large distance. Then

$$n = \left(\frac{p+q}{2}\right)^2 - \left(\frac{p-q}{2}\right)^2. \tag{1}$$

Let $X_0 = \lceil\sqrt{n}\rceil$, $P_0 = X_0^2 - n$, $X_c = X_0 + c$, $P_c = X_c^2 - n$,
where $c > 0$ is an integer. Note that $X_0$ and $P_0$ are both positive integers and
fixed for a particular value of $n$. Since
$X_c^2 - n = (X_0 + c)^2 - (X_0^2 - P_0) = c^2 + 2X_0c + P_0$, we have

$$P_c = c^2 + 2X_0c + P_0. \tag{2}$$

Now, if we compare $n = X_c^2 - P_c$ with Equation (1), we see that we
need to make $P_c$ a perfect square. Then $P_c = \left(\frac{p-q}{2}\right)^2$ and
$X_c = \frac{p+q}{2}$. Hence,
$p = X_0 + c + \sqrt{c^2 + 2X_0c + P_0}$ and $q = X_0 + c - \sqrt{c^2 + 2X_0c + P_0}$.

Thus, the smallest value of $P_c$ ($= c^2 + 2X_0c + P_0$) that is a perfect
square yields the factorization of $n$.
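The search described in this section can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `fermat_c_method` and the `max_steps` cap are assumptions made here, and the loop starts at $c = 0$ so that the case where $P_0$ is already a perfect square is also covered.

```python
from math import isqrt

def fermat_c_method(n, max_steps=10**6):
    """Try c = 0, 1, 2, ... until P_c = c^2 + 2*X_0*c + P_0 is a perfect square.

    Returns (p, q, c) with p >= q and p * q == n, or None if the
    (hypothetical) step cap max_steps is reached first.
    """
    x0 = isqrt(n - 1) + 1            # X_0 = ceil(sqrt(n)) for n >= 1
    p0 = x0 * x0 - n                 # P_0 = X_0^2 - n
    for c in range(max_steps):
        pc = c * c + 2 * x0 * c + p0     # Equation (2)
        r = isqrt(pc)
        if r * r == pc:              # P_c is a perfect square
            return x0 + c + r, x0 + c - r, c
    return None
```

For the small modulus $n = 5959 = 101 \cdot 59$, we have $X_0 = 78$ and $P_0 = 125$, and the search stops at $c = 2$ because $P_2 = 441 = 21^2$.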



3 Further Analysis and a New Method 

In Equation (2), the required value of $c$ is such that $P_c$ is a perfect square. So, it
can be assumed that $P_c = (c + a)^2$ where $a \in \mathbb{N}$. From Equation (2), we have
$(c + a)^2 = c^2 + 2X_0c + P_0$. Therefore,

$$c = \frac{a^2 - P_0}{2(X_0 - a)}. \tag{3}$$

Since $c > 0$, $(a^2 - P_0)$ and $(X_0 - a)$ must both be positive or both be
negative.

1. When both are negative, then $a^2 < P_0$ and $a > X_0$, i.e., $0 < a < \lfloor\sqrt{P_0}\rfloor$
and $a > X_0$ (since $a > 0$). Since $\lfloor\sqrt{P_0}\rfloor \ll X_0$, the ranges are disjoint. So
we have a contradiction: $(a^2 - P_0)$ and $(X_0 - a)$ cannot both be
negative.

2. When both are positive, then $a^2 > P_0$ and $a < X_0$, i.e., $\lceil\sqrt{P_0}\rceil < a < X_0$
(since $a > 0$). Hence, the range is found to be acceptable. So the range of
$a$ is $\lceil\sqrt{P_0}\rceil < a < X_0$.



3.1 Nature of a 

From Equation (3), it is evident that $(a^2 - P_0)$ must be even. So, $a$ and $P_0$ must
be both even or both odd. Also, since for RSA-type values of $n$ ($n$ odd) $X_0$ and
$P_0$ are opposite in nature (here nature refers to oddness or evenness), $a$ and $X_0$
are opposite in nature and $(X_0 - a)$ is always odd.

So, the number of test values of $a$ is halved since, for a particular
value of $n$, the nature of $a$ is fixed. We rewrite Equation (3) as

$$c = \frac{a^2 - P_0}{2(X_0 - a)} = \frac{a^2 - X_0^2}{2(X_0 - a)} + \frac{X_0^2 - P_0}{2(X_0 - a)} = -\frac{X_0 + a}{2} + \frac{n}{2(X_0 - a)}.$$

In the above equation, since $c \in \mathbb{N}$, $(X_0 - a)$ must divide $n$. Since we
already know $n = pq$, where $p$ and $q$ are prime numbers, $(X_0 - a)$ is the smaller
of the two prime numbers.
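A minimal Python sketch of the resulting search over $a$ follows. The function name `a_method` and the `max_steps` cap are assumptions introduced here. The sketch also starts at $a = \lceil\sqrt{P_0}\rceil$ (including the boundary, so that $c = 0$ is not missed) and stops before $X_0 - a = 1$, which would only give the trivial divisor.

```python
from math import isqrt

def a_method(n, max_steps=10**6):
    """Step a through values of the right parity until c in Equation (3) is an integer.

    Assumes n is odd. Returns (p, q, a, c) with q = X_0 - a, or None.
    """
    x0 = isqrt(n - 1) + 1            # X_0 = ceil(sqrt(n))
    p0 = x0 * x0 - n                 # P_0 = X_0^2 - n
    a = isqrt(p0)
    if a * a < p0:
        a += 1                       # a = ceil(sqrt(P_0))
    if (x0 - a) % 2 == 0:
        a += 1                       # X_0 - a must be odd
    for _ in range(max_steps):
        if x0 - a <= 1:              # only the trivial divisor 1 is left
            break
        num, den = a * a - p0, 2 * (x0 - a)
        if num % den == 0:           # c is an integer, so (X_0 - a) divides n
            q = x0 - a
            return n // q, q, a, num // den
        a += 2                       # keep the parity of a fixed
    return None
```

For $n = 5959$ the candidates are $a = 13, 15, 17, 19$, and the search stops at $a = 19$, where $X_0 - a = 59$ and $c = 2$.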



3.2 Test Case Reduction of a. 

We know that the range of $a$ is $\lceil\sqrt{P_0}\rceil < a < X_0$ and, as discussed in the previous
section, the number of test cases in the range is halved. Thus,
$\frac{X_0 - 1 - \lceil\sqrt{P_0}\rceil}{2}$ is the
number of test cases. Also, we know that $(X_0 - a)$ is the smaller of the numbers
$p$ and $q$. So, judging by the value of $n$, we can predict the last digit of
the two factors. Thus, we can actually guess the last digit of $a$.

Now, from 0 to 9, there are five odd and five even numbers. Depending upon
the nature of $a$, five digits qualify as the last digit of $a$ for any test
value of $n$. But $(X_0 - a)$ can never have 5 as its last digit. Hence, the
number of possible last digits of $a$ is 4. This is the maximum number of
valid values for the last digit of $a$.

So, the number of test values of $a$ is reduced to

$$\frac{4}{5}\left(\frac{X_0 - 1 - \lceil\sqrt{P_0}\rceil}{2}\right) = \frac{2\left(X_0 - 1 - \lceil\sqrt{P_0}\rceil\right)}{5}.$$
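The last-digit filter can be illustrated with a short Python sketch. The helper name `allowed_last_digits` is hypothetical, and the sketch assumes that $n$ is coprime to 10, so that both factors end in 1, 3, 7 or 9.

```python
from math import isqrt

def allowed_last_digits(n):
    """Last digits that a may take, given that q = X_0 - a is a factor of n.

    Assumes n is coprime to 10, so q ends in 1, 3, 7 or 9.
    """
    x0 = isqrt(n - 1) + 1                 # X_0 = ceil(sqrt(n))
    units = (1, 3, 7, 9)
    digits = set()
    for d in units:                       # candidate last digit of q
        # d is feasible if some cofactor digit e gives d * e = n (mod 10)
        if any((d * e) % 10 == n % 10 for e in units):
            digits.add((x0 - d) % 10)     # a = X_0 - q (mod 10)
    return sorted(digits)
```

For $n = 5959$ (with $X_0 = 78$) this gives the digits 1, 5, 7 and 9, which is consistent with the value $a = 19$ that yields the factor $59$.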

3.3 Nature of a-c Curve

Equation (3) represents $c$ as a function of $a$. In this section, we explore the
nature of this function. This information is needed in the later sections of this
paper.

We have $c = f(a) = \frac{a^2 - P_0}{2(X_0 - a)}$.
Differentiating $f(a)$ with respect to $a$, we get

$$f'(a) = \frac{-a^2 + 2X_0a - P_0}{2(X_0 - a)^2}. \tag{4}$$

We can write $f'(a) = \frac{a}{2(X_0 - a)} + \frac{aX_0 - P_0}{2(X_0 - a)^2}$, and thereby it follows that $f'(a)$
is always positive in the range $\lceil\sqrt{P_0}\rceil < a < X_0$. Hence, $c = f(a)$ is a
monotonically increasing function in the derived interval.
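The monotonicity claim can be checked numerically on a small case. The sketch below uses exact rational arithmetic and the toy modulus $n = 5959$, both chosen here purely for illustration.

```python
from fractions import Fraction
from math import isqrt

def f(a, x0, p0):
    """c as a function of a, Equation (3), as an exact rational."""
    return Fraction(a * a - p0, 2 * (x0 - a))

def f_prime(a, x0, p0):
    """Derivative of f, Equation (4), as an exact rational."""
    return Fraction(-a * a + 2 * x0 * a - p0, 2 * (x0 - a) ** 2)

n = 5959                         # toy modulus, 101 * 59
x0 = isqrt(n - 1) + 1            # X_0 = 78
p0 = x0 * x0 - n                 # P_0 = 125
lo = isqrt(p0) + 1               # smallest a with a^2 > P_0
values = [f(a, x0, p0) for a in range(lo, x0)]
increasing = all(u < v for u, v in zip(values, values[1:]))
```

On this interval `increasing` evaluates to `True`, and `f_prime` is positive at every integer point.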



3.4 An Alternative Approach 



Here we present an alternative approach to analysing the choices of $c$. Further
analysis of this approach is needed to assess its usefulness. We rewrite
Equation (3) as

$$c = \frac{a^2 - P_0}{2(X_0 - a)} = \frac{X_0}{2} \cdot \frac{\left(\frac{a}{X_0}\right)^2 - \frac{P_0}{X_0^2}}{1 - \frac{a}{X_0}}. \tag{5}$$

Let $\frac{a}{X_0} = \rho$ and $\frac{P_0}{X_0^2} = k$. Since $\lceil\sqrt{P_0}\rceil < a < X_0$, we have
$\sqrt{k} < \rho < 1$. Also, $0 < P_0 < X_0^2 - (X_0 - 1)^2$, i.e., $0 < P_0 < 2X_0 - 1$.
Therefore, $k = \frac{P_0}{X_0^2} < \frac{2X_0 - 1}{X_0^2} < \frac{2}{X_0}$, i.e., $k < \frac{2}{X_0}$.
Hence, $0 < k < \frac{2}{X_0}$. For a particular value of $n$, $k$ is always
constant, and it is typically very close to 0.

Replacing these variables in Equation (5), we get

$$c = f_k(\rho) = \frac{X_0}{2} \cdot \frac{\rho^2 - k}{1 - \rho} = \frac{X_0}{2}\left(\frac{1 - k}{1 - \rho} - (1 + \rho)\right).$$

Since we already know that $c$ is a positive integer, there might be some scope
here for predicting the approximate value of $\rho$ for which $c$ becomes an integer.
Further mathematical exploration is essential for extracting any meaningful
results from this approach.
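As a sanity check, the two forms of $f_k(\rho)$ can be evaluated exactly for a small case. The helper name `rho_forms` and the toy modulus $n = 5959$ are assumptions made only for this sketch.

```python
from fractions import Fraction
from math import isqrt

def rho_forms(n, a):
    """Return (rho, k, form1, form2) for the normalized variables of this section."""
    x0 = isqrt(n - 1) + 1                      # X_0 = ceil(sqrt(n))
    p0 = x0 * x0 - n                           # P_0 = X_0^2 - n
    rho = Fraction(a, x0)                      # rho = a / X_0
    k = Fraction(p0, x0 * x0)                  # k = P_0 / X_0^2
    half_x0 = Fraction(x0, 2)
    form1 = half_x0 * (rho * rho - k) / (1 - rho)
    form2 = half_x0 * ((1 - k) / (1 - rho) - (1 + rho))
    return rho, k, form1, form2
```

For $n = 5959$ and $a = 19$ we get $\rho = 19/78$ and $k = 125/6084$, and both forms evaluate to $c = 2$.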



3.5 Practical Range Determination for a 

At the beginning of Section 3, we deduced the range of $a$ for which large integer
factorization is possible. So far, the primality of the factors has not been considered.
In this section, we further explore the concept in the context of the RSA
cryptosystem. We assume $n = pq$, $q < p < 2q$, where $p$ and $q$ are two prime numbers. In
Section 2, we derived that

$p = X_c + \sqrt{P_c}$ and $q = X_c - \sqrt{P_c}$, where $P_c = c^2 + 2X_0c + P_0$.

Using these expressions in the relation $q < p < 2q$, we get

$$X_c - \sqrt{P_c} < X_c + \sqrt{P_c} < 2\left(X_c - \sqrt{P_c}\right).$$

That is, $-\sqrt{P_c} < \sqrt{P_c} < X_c - 2\sqrt{P_c}$.

Now, $P_c > P_0$ as $c > 0$, and hence we can say $\sqrt{P_c} > \sqrt{P_0}$.
So, the relation becomes $\sqrt{P_0} < \sqrt{P_c} < X_c - 2\sqrt{P_c}$.

Again,

$$\sqrt{P_c} < X_c - 2\sqrt{P_c} \iff 3\sqrt{P_c} < X_c \iff 9P_c < X_c^2 \quad (\text{squaring both sides})$$

$$\iff 9(c^2 + 2X_0c + P_0) < (X_0 + c)^2 \iff c^2 + 2X_0c + P_0 - \frac{n}{8} < 0 \quad (\text{since } X_0^2 - P_0 = n).$$

Boundary values of $c = -X_0 \pm \sqrt{n + \frac{n}{8}}$. Since $c > 0$, we have

$$0 < c < -X_0 + \sqrt{\frac{9n}{8}} \iff 0 < c < -X_0 + 1.061X_0 \quad (\text{considering } \sqrt{n} \text{ as } X_0).$$

Hence, $0 < c < 0.061X_0$. The range is approximate in calculation.
From Equation (3), we get

$$\frac{a^2 - P_0}{2(X_0 - a)} < 0.061X_0 \iff a^2 + 0.122X_0a - (P_0 + 0.122X_0^2) < 0.$$

Boundary values of $a = -0.061X_0 \pm X_0\sqrt{0.125721 + k}$, where $k = P_0/X_0^2$ is
as defined in Section 3.4 and is constant for a particular value of $n$. $a$ has already
been proved to be $> \lceil\sqrt{P_0}\rceil$. Hence,
$\lceil\sqrt{P_0}\rceil < a < X_0\left(\sqrt{0.125721 + k} - 0.061\right)$.
We know that $k < \frac{2}{X_0}$ and so its value is very small. Thus, it can be considered
that $\sqrt{0.125721 + k} < 0.36$.
Hence,

$$\lceil\sqrt{P_0}\rceil < a < 0.3X_0. \tag{6}$$

The range is approximate in calculation.
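The two approximate bounds can be probed empirically on small primes. The sketch below is only a spot check under stated assumptions: the helper names `primes_between` and `bounds_hold` are hypothetical, and the prime range is an arbitrary choice in which every pair satisfies $q < p < 2q$.

```python
from math import isqrt

def primes_between(lo, hi):
    """Primes in [lo, hi] by a simple sieve of Eratosthenes."""
    sieve = bytearray([1]) * (hi + 1)
    sieve[0:2] = b"\x00\x00"
    for i in range(2, isqrt(hi) + 1):
        if sieve[i]:
            sieve[i * i::i] = bytearray(len(range(i * i, hi + 1, i)))
    return [i for i in range(lo, hi + 1) if sieve[i]]

def bounds_hold(p, q):
    """Check c < 0.061 * X_0 and a < 0.3 * X_0 for n = p * q, using integers only."""
    n = p * q
    x0 = isqrt(n - 1) + 1            # X_0 = ceil(sqrt(n))
    a = x0 - q                       # from q = X_0 - a
    c = (p + q) // 2 - x0            # from X_c = (p + q) / 2 = X_0 + c
    return 1000 * c < 61 * x0 and 10 * a < 3 * x0

ps = primes_between(1000, 2000)
all_ok = all(bounds_hold(p, q) for q in ps for p in ps if q < p < 2 * q)
```

A check of this kind does not replace the derivation, but it shows how $a$ and $c$ are recovered from a known factor pair.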



4 Comparative Study of the Two Methods' Effective Regions

So far we have discussed two different approaches. In the first one, we try to 
calculate the value of c through several iterations. Let us designate this method 
as c-method. In the second one, we try to calculate the value of a through 
several iterations. Let us call this method the a-method. The objective here is to 
determine the region of effectiveness of the two methods. Let us first summarize 
the steps involved in both the methods. 



1. c-method: compute $c^2$, $2X_0c$, $c^2 + 2X_0c$, $c^2 + 2X_0c + P_0$,
$x = \sqrt{c^2 + 2X_0c + P_0}$, $y = x - \lfloor x \rfloor$. Increment $c$ until $y = 0$.

2. a-method: compute $a^2$, $a^2 - P_0$, $2a$, $2X_0 - 2a$,
$x = \frac{a^2 - P_0}{2X_0 - 2a}$, $y = x - \lfloor x \rfloor$. Increment $a$
until $y = 0$.

Thus, in terms of the number of steps, both methods are equally effective. In
general, comparative effectiveness can be determined by calculating the minimum
number of consecutive iterations of $c$ needed to match the progress of a
certain number of iterations of $a$. We know from Section 3.2 that out of every
10 consecutive integer values of $a$, we need to consider only four of them. Since,
in the c-method, we increment $c$ by 1, we can say that $\frac{dc}{da} = f'(a) > 0.4$. Hence, from
Equation (4), we get

$$\frac{-a^2 + 2X_0a - P_0}{2(X_0 - a)^2} > 0.4$$

$$\iff 9a^2 - 18X_0a + 9P_0 + 4(X_0^2 - P_0) < 0$$

$$\iff a^2 - 2X_0a + \left(P_0 + \frac{4n}{9}\right) < 0 \quad (\text{since } X_0^2 - P_0 = n).$$

Boundary values of $a = X_0 \pm \sqrt{(X_0^2 - P_0) - \frac{4n}{9}} = X_0 \pm \sqrt{\frac{5n}{9}}
\approx X_0 \pm \frac{\sqrt{5}}{3}X_0$ (considering $\sqrt{n}$ as $X_0$) $\approx X_0(1 \pm 0.745)$,
i.e., $0.255X_0 < a < 1.745X_0$.

Since we know $\lceil\sqrt{P_0}\rceil < a < X_0$, the a-method is more effective in
the region $0.255X_0 < a < X_0$ and the c-method is more effective in the region
$\lceil\sqrt{P_0}\rceil < a < 0.255X_0$.

From Equation (6), we can say that for RSA-specific purposes, the a-method is
more effective in the region $0.255X_0 < a < 0.3X_0$ and the c-method is more effective
in the region $\lceil\sqrt{P_0}\rceil < a < 0.255X_0$. The ranges are approximate in calculation.
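The crossover point can also be located exactly rather than through the approximation $\sqrt{n} \approx X_0$. The sketch below assumes the hypothetical helper names `slope` and `crossover`, and uses the identity $-a^2 + 2X_0a - P_0 = n - (X_0 - a)^2$, which turns $f'(a) > 0.4$ into $9(X_0 - a)^2 < 5n$.

```python
from fractions import Fraction
from math import isqrt

def slope(a, x0, p0):
    """f'(a) from Equation (4), as an exact rational."""
    return Fraction(-a * a + 2 * x0 * a - p0, 2 * (x0 - a) ** 2)

def crossover(n):
    """Smallest integer a < X_0 with f'(a) > 0.4, i.e. 9 * (X_0 - a)^2 < 5 * n."""
    x0 = isqrt(n - 1) + 1            # X_0 = ceil(sqrt(n))
    d = isqrt((5 * n - 1) // 9)      # largest d = X_0 - a with 9 * d^2 < 5 * n
    return x0 - d
```

For the toy modulus $n = 5959$ the crossover is $a = 21$, about $0.27X_0$. The ratio approaches $1 - \sqrt{5}/3 \approx 0.2546$ as $n$ grows, which is the source of the rounded value $0.255$.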



5 An Illustrative Example 



Let us consider a 264-bit integer

$n = 24758167959654528007156374531915464081839760935532218683689708649238085888673119$

with factors

$p = 6847944682037444681162770672798288913849$ and
$q = 3615415881585117908550243505309785526231$.

We have
$X_0 = 4975758028647949436694003969298664117473$,
$P_0 = 3171681298218633703780106501840055232610$,
$c = 255922253163331858162503119755373102567$,
$a = 1360342147062831528143760463988878591242$.

The lower bound of the interval in which the a-method gives better results is
$0.255X_0 = 1268818297305227106356971012171159349956$.
The number of iterations for the a-method is
$z = a - 0.255X_0 = 91523849757604421786789451817719241286$.
Considering sieving, the number of iterations is
$0.4z = 36609539903041768714715780727087696514$.
Thus, the difference between the number of test cases for the c-method and the
a-method is $c - 0.4z = 219312713260290089447787339028285406053$.
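The quantities in this example can be recomputed from $p$ and $q$ with Python's arbitrary-precision integers. This is a sketch under stated assumptions: $n$ is taken as the product $pq$, $0.255X_0$ is rounded to the nearest integer, and $0.4z$ is truncated. These rounding conventions are assumptions, as the text does not state them.

```python
from math import isqrt

p = 6847944682037444681162770672798288913849
q = 3615415881585117908550243505309785526231
n = p * q

x0 = isqrt(n - 1) + 1                 # X_0 = ceil(sqrt(n))
p0 = x0 * x0 - n                      # P_0 = X_0^2 - n
c = (p + q) // 2 - x0                 # from X_c = (p + q) / 2 = X_0 + c
a = x0 - q                            # from q = X_0 - a

lower = (255 * x0 + 500) // 1000      # 0.255 * X_0, rounded to nearest (assumed)
z = a - lower                         # a-method iterations past the crossover
saved = c - (4 * z) // 10             # c - 0.4 * z, with 0.4 * z truncated (assumed)

print("X_0 =", x0)
print("c   =", c)
print("a   =", a)
print("c - 0.4z =", saved)
```

The identities $X_0 - a = q$ and $(c + a)^2 = X_c^2 - n$ hold by construction, so they give a quick consistency check on the values of $a$ and $c$.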